Goto

Collaborating Authors

 Coleman County


Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

Bruch, Sebastian, Nardini, Franco Maria, Rulli, Cosimo, Venturini, Rossano

arXiv.org Artificial Intelligence

Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.


Branched Broomrape Detection in Tomato Farms Using Satellite Imagery and Time-Series Analysis

Narimani, Mohammadreza, Pourreza, Alireza, Moghimi, Ali, Farajpoor, Parastoo, Jafarbiglu, Hamid, Mesgaran, Mohsen

arXiv.org Artificial Intelligence

Branched broomrape (Phelipanche ramosa (L.) Pomel) is a chlorophyll-deficient parasitic plant that threatens tomato production by extracting nutrients from the host, with reported yield losses up to 80 percent. Its mostly subterranean life cycle and prolific seed production (more than 200,000 seeds per plant, viable for up to 20 years) make early detection essential. We present an end-to-end pipeline that uses Sentinel-2 imagery and time-series analysis to identify broomrape-infested tomato fields in California. Regions of interest were defined from farmer-reported infestations, and images with less than 10 percent cloud cover were retained. We processed 12 spectral bands and sun-sensor geometry, computed 20 vegetation indices (e.g., NDVI, NDMI), and derived five plant traits (Leaf Area Index, Leaf Chlorophyll Content, Canopy Chlorophyll Content, Fraction of Absorbed Photosynthetically Active Radiation, and Fractional Vegetation Cover) using a neural network calibrated with ground-truth and synthetic data. Trends in Canopy Chlorophyll Content delineated transplanting-to-harvest periods, and phenology was aligned using growing degree days. Vegetation pixels were segmented and used to train a Long Short-Term Memory (LSTM) network on 18,874 pixels across 48 growing-degree-day time points. The model achieved 88 percent training accuracy and 87 percent test accuracy, with precision 0.86, recall 0.92, and F1 0.89. Permutation feature importance ranked NDMI, Canopy Chlorophyll Content, FAPAR, and a chlorophyll red-edge index as most informative, consistent with the physiological effects of infestation. Results show the promise of satellite-driven time-series modeling for scalable detection of parasitic stress in tomato farms.


Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities

Hakim, Safayat Bin, Adil, Muhammad, Velasquez, Alvaro, Xu, Shouhuai, Song, Houbing Herbert

arXiv.org Artificial Intelligence

Traditional Artificial Intelligence (AI) approaches in cybersecurity exhibit fundamental limitations: inadequate conceptual grounding leading to non-robustness against novel attacks; limited instructibility impeding analyst-guided adaptation; and misalignment with cybersecurity objectives. Neuro-Symbolic (NeSy) AI has emerged with the potential to revolutionize cybersecurity AI. However, there is no systematic understanding of this emerging approach. These hybrid systems address critical cybersecurity challenges by combining neural pattern recognition with symbolic reasoning, enabling enhanced threat understanding while introducing concerning autonomous offensive capabilities that reshape threat landscapes. In this survey, we systematically characterize this field by analyzing 127 publications spanning 2019-July 2025. We introduce a Grounding-Instructibility-Alignment (G-I-A) framework to evaluate these systems, focusing on both cyber defense and cyber offense across network security, malware analysis, and cyber operations. Our analysis shows advantages of multi-agent NeSy architectures and identifies critical implementation challenges including standardization gaps, computational complexity, and human-AI collaboration requirements that constrain deployment. We show that causal reasoning integration is the most transformative advancement, enabling proactive defense beyond correlation-based approaches. Our findings highlight dual-use implications where autonomous systems demonstrate substantial capabilities in zero-day exploitation while achieving significant cost reductions, altering threat dynamics. We provide insights and future research directions, emphasizing the urgent need for community-driven standardization frameworks and responsible development practices that ensure advancement serves defensive cybersecurity objectives while maintaining societal alignment.


Meta-training of diffractive meta-neural networks for super-resolution direction of arrival estimation

Yang, Songtao, Gao, Sheng, Wu, Chu, Zhao, Zejia, Zhang, Haiou, Lin, Xing

arXiv.org Artificial Intelligence

Diffractive neural networks leverage the high-dimensional characteristics of electromagnetic (EM) fields for high-throughput computing. However, the existing architectures face challenges in integrating large-scale multidimensional metasurfaces with precise network training and haven't utilized multidimensional EM field coding scheme for super-resolution sensing. Here, we propose diffractive meta-neural networks (DMNNs) for accurate EM field modulation through metasurfaces, which enable multidimensional multiplexing and coding for multi-task learning and high-throughput super-resolution direction of arrival estimation. DMNN integrates pre-trained mini-metanets to characterize the amplitude and phase responses of meta-atoms across different polarizations and frequencies, with structure parameters inversely designed using the gradient-based meta-training. For wide-field super-resolution angle estimation, the system simultaneously resolves azimuthal and elevational angles through x and y-polarization channels, while the interleaving of frequency-multiplexed angular intervals generates spectral-encoded optical super-oscillations to achieve full-angle high-resolution estimation. Post-processing lightweight electronic neural networks further enhance the performance. Experimental results validate that a three-layer DMNN operating at 27 GHz, 29 GHz, and 31 GHz achieves $\sim7\times$ Rayleigh diffraction-limited angular resolution (0.5$^\circ$), a mean absolute error of 0.048$^\circ$ for two incoherent targets within a $\pm 11.5^\circ$ field of view, and an angular estimation throughput an order of magnitude higher (1917) than that of existing methods. The proposed architecture advances high-dimensional photonic computing systems by utilizing inherent high-parallelism and all-optical coding methods for ultra-high-resolution, high-throughput applications.


Modeling Partially Observed Nonlinear Dynamical Systems and Efficient Data Assimilation via Discrete-Time Conditional Gaussian Koopman Network

Chen, Chuanqi, Wang, Zhongrui, Chen, Nan, Wu, Jin-Long

arXiv.org Artificial Intelligence

A discrete-time conditional Gaussian Koopman network (CGKN) is developed in this work to learn surrogate models that can perform efficient state forecast and data assimilation (DA) for high-dimensional complex dynamical systems, e.g., systems governed by nonlinear partial differential equations (PDEs). Focusing on nonlinear partially observed systems that are common in many engineering and earth science applications, this work exploits Koopman embedding to discover a proper latent representation of the unobserved system states, such that the dynamics of the latent states are conditional linear, i.e., linear with the given observed system states. The modeled system of the observed and latent states then becomes a conditional Gaussian system, for which the posterior distribution of the latent states is Gaussian and can be efficiently evaluated via analytical formulae. The analytical formulae of DA facilitate the incorporation of DA performance into the learning process of the modeled system, which leads to a framework that unifies scientific machine learning (SciML) and data assimilation. The performance of discrete-time CGKN is demonstrated on several canonical problems governed by nonlinear PDEs with intermittency and turbulent features, including the viscous Burgers' equation, the Kuramoto-Sivashinsky equation, and the 2-D Navier-Stokes equations, with which we show that the discrete-time CGKN framework achieves comparable performance as the state-of-the-art SciML methods in state forecast and provides efficient and accurate DA results. The discrete-time CGKN framework also serves as an example to illustrate unifying the development of SciML models and their other outer-loop applications such as design optimization, inverse problems, and optimal control.


Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction

Najjar, Hiba, Pathak, Deepak, Nuske, Marlon, Dengel, Andreas

arXiv.org Artificial Intelligence

Multimodal learning enables various machine learning tasks to benefit from diverse data sources, effectively mimicking the interplay of different factors in real-world applications, particularly in agriculture. While the heterogeneous nature of involved data modalities may necessitate the design of complex architectures, the model interpretability is often overlooked. In this study, we leverage the intrinsic explainability of Transformer-based models to explain multimodal learning networks, focusing on the task of crop yield prediction at the subfield level. The large datasets used cover various crops, regions, and years, and include four different input modalities: multispectral satellite and weather time series, terrain elevation maps and soil properties. Based on the self-attention mechanism, we estimate feature attributions using two methods, namely the Attention Rollout (AR) and Generic Attention (GA), and evaluate their performance against Shapley-based model-agnostic estimations, Shapley Value Sampling (SVS). Additionally, we propose the Weighted Modality Activation (WMA) method to assess modality attributions and compare it with SVS attributions. Our findings indicate that Transformer-based models outperform other architectures, specifically convolutional and recurrent networks, achieving R2 scores that are higher by 0.10 and 0.04 at the subfield and field levels, respectively. AR is shown to provide more robust and reliable temporal attributions, as confirmed through qualitative and quantitative evaluation, compared to GA and SVS values. Information about crop phenology stages was leveraged to interpret the explanation results in the light of established agronomic knowledge. Furthermore, modality attributions revealed varying patterns across the two methods compared.[...]


From Field to Drone: Domain Drift Tolerant Automated Multi-Species and Damage Plant Semantic Segmentation for Herbicide Trials

Picon, Artzai, Eguskiza, Itziar, Mugica, Daniel, Romero, Javier, Jimenez, Carlos Javier, White, Eric, Do-Lago-Junqueira, Gabriel, Klukas, Christian, Navarra-Mestre, Ramon

arXiv.org Artificial Intelligence

Field trials are vital in herbicide research and development to assess effects on crops and weeds under varied conditions. Traditionally, evaluations rely on manual visual assessments, which are time-consuming, labor-intensive, and subjective. Automating species and damage identification is challenging due to subtle visual differences, but it can greatly enhance efficiency and consistency. We present an improved segmentation model combining a general-purpose self-supervised visual model with hierarchical inference based on botanical taxonomy. Trained on a multi-year dataset (2018-2020) from Germany and Spain using digital and mobile cameras, the model was tested on digital camera data (year 2023) and drone imagery from the United States, Germany, and Spain (year 2024) to evaluate robustness under domain shift. This cross-device evaluation marks a key step in assessing generalization across platforms of the model. Our model significantly improved species identification (F1-score: 0.52 to 0.85, R-squared: 0.75 to 0.98) and damage classification (F1-score: 0.28 to 0.44, R-squared: 0.71 to 0.87) over prior methods. Under domain shift (drone images), it maintained strong performance with moderate degradation (species: F1-score 0.60, R-squared 0.80; damage: F1-score 0.41, R-squared 0.62), where earlier models failed. These results confirm the model's robustness and real-world applicability. It is now deployed in BASF's phenotyping pipeline, enabling large-scale, automated crop and weed monitoring across diverse geographies.


Disaster Informatics after the COVID-19 Pandemic: Bibliometric and Topic Analysis based on Large-scale Academic Literature

Tran, Ngan, Chen, Haihua, Cleveland, Ana, Zhou, Yuhan

arXiv.org Artificial Intelligence

This study presents a comprehensive bibliometric and topic analysis of the disaster informatics literature published between January 2020 to September 2022. Leveraging a large-scale corpus and advanced techniques such as pre-trained language models and generative AI, we identify the most active countries, institutions, authors, collaboration networks, emergent topics, patterns among the most significant topics, and shifts in research priorities spurred by the COVID-19 pandemic. Our findings highlight (1) countries that were most impacted by the COVID-19 pandemic were also among the most active, with each country having specific research interests, (2) countries and institutions within the same region or share a common language tend to collaborate, (3) top active authors tend to form close partnerships with one or two key partners, (4) authors typically specialized in one or two specific topics, while institutions had more diverse interests across several topics, and (5) the COVID-19 pandemic has influenced research priorities in disaster informatics, placing greater emphasis on public health. We further demonstrate that the field is converging on multidimensional resilience strategies and cross-sectoral data-sharing collaborations or projects, reflecting a heightened awareness of global vulnerability and interdependency. Collecting and quality assurance strategies, data analytic practices, LLM-based topic extraction and summarization approaches, and result visualization tools can be applied to comparable datasets or solve similar analytic problems. By mapping out the trends in disaster informatics, our analysis offers strategic insights for policymakers, practitioners, and scholars aiming to enhance disaster informatics capacities in an increasingly uncertain and complex risk landscape.


MoistureMapper: An Autonomous Mobile Robot for High-Resolution Soil Moisture Mapping at Scale

Rose, Nathaniel, Chuang, Hannah, Andrade-Rodriguez, Manuel A, Parashar, Rishi, Or, Dani, Maini, Parikshit

arXiv.org Artificial Intelligence

-- Soil moisture is a quantity of interest in many application areas including agriculture and climate modeling. Existing methods are not suitable for scale applications due to large deployment costs in high-resolution sensing applications such as for variable irrigation. In this work, we design, build and field deploy an autonomous mobile robot, MoistureMapper, for soil moisture sensing. The robot is equipped with Time Domain Reflectometry (TDR) sensors and a direct push drill mechanism for deploying the sensor to measure volumetric water content in the soil. Additionally, we implement and evaluate multiple adaptive sampling strategies based on a Gaussian Process based modeling to build a spatial mapping of moisture distribution in the soil. The adaptive sampling approach outperforms a greedy benchmark approach and results in up to 30% reduction in travel distance and 5% reduction in variance in the reconstructed moisture maps. Link to video showing field experiments: https://youtu.be/S4bJ4tRzObg


Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy

Pratiush, Utkarsh, Houston, Austin, Barakati, Kamyar, Raghavan, Aditya, Yoon, Dasol, KP, Harikrishnan, Baraissov, Zhaslan, Ma, Desheng, Welborn, Samuel S., Jakowski, Mikolaj, Barhorst, Shawn-Patrick, Pattison, Alexander J., Manganaris, Panayotis, Madugula, Sita Sirisha, Ayyagari, Sai Venkata Gayathri, Kennedy, Vishal, Bulanadi, Ralph, Wang, Michelle, Pang, Kieran J., Addison-Smith, Ian, Menacho, Willy, Guzman, Horacio V., Kiefer, Alexander, Furth, Nicholas, Kolev, Nikola L., Petrov, Mikhail, Liu, Viktoriia, Ilyev, Sergey, Rairao, Srikar, Rodani, Tommaso, Pinto-Huguet, Ivan, Chen, Xuli, Cruañes, Josep, Torrens, Marta, Pomar, Jovan, Su, Fanzhi, Vedanti, Pawan, Lyu, Zhiheng, Wang, Xingzhi, Yao, Lehan, Taqieddin, Amir, Laskowski, Forrest, Yin, Xiangyu, Shao, Yu-Tsun, Fein-Ashley, Benjamin, Jiang, Yi, Kumar, Vineet, Mishra, Himanshu, Paul, Yogesh, Bazgir, Adib, Madugula, Rama chandra Praneeth, Zhang, Yuwen, Omprakash, Pravan, Huang, Jian, Montufar-Morales, Eric, Chawla, Vivek, Sethi, Harshit, Huang, Jie, Kurki, Lauri, Guinan, Grace, Salvador, Addison, Ter-Petrosyan, Arman, Van Winkle, Madeline, Spurgeon, Steven R., Narasimha, Ganesh, Wu, Zijie, Liu, Richard, Liu, Yongtao, Slautin, Boris, Lupini, Andrew R, Vasudevan, Rama, Duscher, Gerd, Kalinin, Sergei V.

arXiv.org Artificial Intelligence

Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The data generated is often well-structured, enriched with metadata and sample histories, though not always consistent in detail or format. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies. As a result, data usage is inefficient and analysis time is extensive. In addition to post-acquisition analysis, new APIs from major microscope manufacturers enable real-time, ML-based analytics for automated decision-making and ML-agent-controlled microscope operation. Yet, a gap remains between the ML and microscopy communities, limiting the impact of these methods on physics, materials discovery, and optimization. Hackathons help bridge this divide by fostering collaboration between ML researchers and microscopy experts. They encourage the development of novel solutions that apply ML to microscopy, while preparing a future workforce for instrumentation, materials science, and applied ML. This hackathon produced benchmark datasets and digital twins of microscopes to support community growth and standardized workflows. All related code is available at GitHub: https://github.com/KalininGroup/Mic-hackathon-2024-codes-publication/tree/1.0.0.1